Complete clear text representation of scientific
documents in machine-readable form

by

Blanton C. Duncan and David Garvin

Science and technology use a large variety of symbols to
represent physical properties, chemical formulas and
mathematical expressions. Data centers that codify and evaluate
physical properties need to use this conventional symbolism
in their work. It is recommended that these data centers
adopt the symbols and terminology specified by the various
International Unions both in manual operations and in the
creation of machine-readable data bases.

It is demonstrated that these conventional symbols can
be produced by modern communications devices that are
compatible with the international standard codes for
information interchange. A set of characters suitable for
representing scientific data and text is presented and
proposed as an extension of the ISO information interchange
code.

The use of this extended character code by computer 
oriented data centers at the National Bureau of Standards 
is described. The equipment needed for this level of 
performance and criteria for their selection are outlined. 

Key Words: graphic character sets; information analysis centers;
information interchange codes; recording typewriters; scientific
computer technology.



The problem of codifying the results of scientific 
research has received increased emphasis in recent years. 
The task is immense. Data produced throughout the world
must be assembled and analyzed by experts, and the best possible
results must be made available to the ultimate user in a useful form.
A promising approach which has received widespread attention
is the establishment of a large number of data analysis centers,
each devoted to a specialty. As the number of centers grows,
so will the need for them to trade information, often across
national boundaries. The centers will also need to provide
information to remote users and do this quickly. This means that
a substantial communications problem arises among a large number
of independent groups.

Coupled to these problems of data analysis and communications
is that of automation of data centers. The use of
computers for data collection, reduction and analysis is
widespread. Data centers need, in addition, text processing
and file handling techniques suitable for the material they
must collect, index, store and analyze.

This paper treats a basic subject: the recording of
scientific data and text in an automated environment. It
describes a "man-machine" alphabet or symbolism for the
interchange and processing of scientific data: the General
Purpose Scientific Document Code. The needs of the
human are met by providing a set of symbols suitable for
producing the basic scientific document - the typescript of a
paper. The machine needs are met by associating this alphabet
with the existing international standard system for information
interchange.

The orientation of this paper is toward human requirements.
The machine is to serve man, not vice-versa. To this end,
emphasis will be placed on the signs, or graphic characters,
used in written communications. The control elements necessary
for the machine manipulation of this man-machine alphabet will
be treated lightly, except where it is essential that they be
mentioned.

This work has been, and is, experimental. The context in
which it was done controlled many of the decisions that were
made. This background is the interaction of several small,
independently managed data analysis centers at the National
Bureau of Standards (NBS) both among themselves and with a
general service computer center. The data centers and the
computer services center have distinctly different problems.
The data centers, on the basis of substantial experience
operating in a non-automated mode, have specified their text
handling requirements. These run far beyond the facilities
for processing and printing commonly available in computer
centers. One example is the requirement for both capital and
small Latin letters for representing the chemical elements.
Another is the need for both superscripts and subscripts in
mathematical, chemical, physical property and spectroscopic
notation. The needs of the data centers are consistent with,
and can be tested against, the extensive work of international
scientific organizations to define symbols, terminology and
nomenclature. These standardized symbols are displayed and
defined in ISO, IUPAP, and IUPAC documents published during the
1960's [1,2,3,4].

From the viewpoint of a general service computer center,
these are highly specialized requirements of a minor group of
its customers. By definition (or by default) the majority of
its existing customers can accept constraints imposed by
keypunches and printers with limited character sets. However, the
growing needs of data centers must be met. The facilities they
require can be expected to have wider application, but a computer
center cannot accept the responsibility to support an absolutely
open-ended man-machine alphabet. A finite solution is needed.

The systematic development of a suitable, finite coded
character set became practical in April 1965 with the publication
of a proposed revision of the American National Standard Code
for Information Interchange (ASCII) [5]. ASCII is an
anticipatory standard. Even today the hardware and software
of many general service computer centers are not designed to
handle flows of data between man and machine at the levels
of complexity anticipated by basic ASCII. But that standard,
and the international one to which it is closely related,
provide a carefully defined system within which the needs of
science can be met. The standards can be used by computer
centers for the planning of improvements in their services.

Thus the standards developed for science and those developed
for communications can be brought to bear on a solution.
However, standards are not enough. They must be implemented
in hardware and software. The crucial parts are input
devices to record the data, processing programs to accept,
edit, reformat, retrieve, and store the data, and output
machines to print clear, complete scientific text.

An experimental processing system based on the concepts
described in this paper has been in use, on a production
basis, by the NBS data centers since 1967. This "field
testing" and the on-going standardization activities in the
communications field have forced changes in detail but not
in concept.

Parts of this work have been described before. Others
are summarized here for the first time. A prototype
input-output device, the "taxywriter", was developed to demonstrate
the feasibility of the system [6]. Although superseded
by commercially available instruments, it remains in service.
The first version of this scientific man-machine alphabet
was described in 1968 [7]. Comments received and
further study led to a revised set of graphics. These
were incorporated in a line printer installed at NBS [8].
From the human point of view this printer is the most
important development. It can display fully a scientific
typescript. An allied development is the use of
photocomposition for final output. Surprisingly enough, this
system, although designed for [...] is sufficiently
rich to provide the printed results expected by the scientist.

The sections that follow treat a variety of topics. The
objective is an overview of the subject. Concepts and
criteria are emphasized at the expense of detail. The symbolism
of science is displayed (section 2). An introduction to the
communications codes is provided. How these may be extended
to meet scientific needs is explained (section 3). A
specific extension, the General Purpose Scientific Document
Code, is described together with examples of its use (section
4). Criteria for selection of input devices and the design
of printers are discussed in so far as they bear on the
man-machine interface.

It is appropriate to state here the conclusions that we
have drawn from this work, or, if you will, display our
biases. These are:

(a) The currently used complex terminology and symbols
of science and technology are needed to represent the
wide range of properties that are measured and the
theories that interpret them.

(b) Data centers will need to use this symbolism in 
their internal work and in communication with others. 
In the present state of the art, automation equipment 
and computer techniques present no insuperable barriers to 
the use of this complex symbolism. 

(c) Techniques for automated handling of scientific
text can be developed within the context of inter- 
national standards for information interchange. 
Cooperative development is possible on this basis. 

(d) General concepts and criteria for text handling 
can be developed for the design of equipment and im- 
plementation of operating systems. 

(e) The human factor, not the machine, should control
the development. Ease of operation and flexibility 
must have first priority. 



(f) A system must be expandable to meet future
demands. The present is always an approximation. 



2. The Symbols of Science



With what type of material must one contend in scientific
communication? The determination must take into account the
formal recommendations in standards documents [1, 4],
publication practices, and manuals of style [9, 10, 11]. The
"IUPAC Manual for Symbols and Terminology for Physicochemical
Quantities" [4] displays the formal recommendations very well.
The other standard documents enlarge the set slightly.
Publication practices have been sampled by examining publications
of the American Chemical Society, the Association for Computing
Machinery, the American Institute of Physics, the American
Association for the Advancement of Science and the National
Bureau of Standards. Among the style manuals, the "Handbook
for Authors" of the American Chemical Society is particularly
important [11]. It devotes considerable attention to the
preparation of a scientific typescript. It gives many
illustrations of the thesis developed below.

2.1 Information content of a scientific article. Each
article in a technical journal reaches the editor and the
technical reviewers in the form of a typescript. This
typewritten copy had (or should have had) all the symbols,
equations, chemical formulas, etc. in a form readily
recognizable by the typesetter. This leads to a general
rule:

The copy that can be produced by a scientific
typist contains all the information that appears
in written scientific communications. The
minimum acceptable level of performance of a
text-handling system for science is full
reproduction of a scientific typescript.

2.2 Classes of symbols needed. The IUPAC definitions
for the display of physicochemical quantities show that the 
complete Latin and Greek alphabets both in capital and small 
letters are necessary. A variety of special signs are needed 
to denote operations, relations, etc. Special relationships 
among symbols (such as the use of superscripts and subscripts) 
are prescribed. These have high information content. 




The IUPAC definitions also show that typescript notations
are required to supply stylistic information for the
typesetter in order to permit the following rules to be
implemented.

Upright (roman) type is used for chemical formulas
(section [illegible].2, ref. 4), units (section 3.1) and
mathematical operators (section 6).

Slanting (italic) type is used for symbols for
physical properties. These symbols are letters
of the Latin and Greek alphabets (section 1.5).
Vector quantities are printed in heavy (bold-face)
slanting type (section 1.5).

Superscripts and subscripts which are themselves
symbols for physical properties are printed in
slanting type. All others are printed in upright
type (section 1.6).



Typescript notations to meet the requirements of these 
rules are discussed in section 4.3 of this paper. 

The scientific typist needs two other classes of symbols.
The first is a set of line segments for ruling tables,
representing structures of organic chemicals, producing flow
charts, etc. The second is a miscellany of marks ordinarily
used in preparing text, such as punctuation, currency symbols
and diacritical marks.

An automated system can usefully provide an additional 
feature. This is a set of dots for plotting. These are 
more likely to be used with computer generated data than by 
a typist. 

Figure 1 shows a set of symbols useful for science. 
Figure 2 shows several diagrams subject to being keyboarded. 

2.3 Scientific communication. It has been a difficult
task to define symbols and terminology for science. This 
task has occupied committees of the various scientific 
unions for decades. These committees have been successful 



* Figure 1 is a photographic reduction of original copy
produced at 3:1 scale on a computer driven incremental plotter
using output from a typewriter simulator program. This design
tool is discussed further in Section 4.8.




SCIENTIFIC NOTATION

Latin and Greek Alphabets                        86 symbols

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    abcdefghijklmnopqrstuvwxyz

Arabic Numerals                                  10 symbols

    1234567890

Mathematical Operators and Miscellaneous         60 symbols

Line Segments and Dots for Diagrams              20 symbols

USAGE: Letters may be upright (roman) and sloping
(italic) in light or heavy (bold) type. All
symbols may be used as superscripts and subscripts.
Scientific typescripts reflect this usage.

Figure 1. Illustration of the notation employed in
scientific typescripts when the typist
has an adequate set of auxiliary type
characters for the typewriter.

Figure 2

Diagrams. An electrical circuit diagram, chemical formulas,
mathematical display equations, a graph and a block diagram.
The first three (above) were produced as output on a computer
driven line printer. The remainder (following two pages) were
produced on a teletypewriter operating under punched paper
tape control.

Block diagram labels: Designation of Sets of Graphics;
Repertory of Graphics Encoded with Multiple Bytes, 79 Sets;
Repertory of Graphics Encoded Normally, 158 Sets.

Designation of Sets of Graphics:

D0 = ESC 2/8 or 2/12 (F)     D1 = ESC 2/9 or 2/15 (F)
D2 = ESC 2/10 or 2/14 (F)    D3 = ESC 2/11 or 2/15 (F)
D4 = ESC 2/4 (F)

Figure 2. (concluded)

in providing a carefully defined comprehensive notation. It
should be used.

Data centers should use this notation simply because it
is their mission to codify the results of scientific
investigation. It is reasonable to expect them to lead the way in
standardizing the transfer of scientific information.
Specialists in automation and information exchange must
also consider these notational schemes. They are existing,
not projected, procedures. They show demands that may be
expected to increase in intensity in the future.

2.4 Summary. Science and technology transfer information
using systems of notation that are far more complex than
those required for newspapers, magazines and telegrams. On
the other hand, all except the last of these common media
for information interchange use symbolisms that are beyond
the de facto standard output provided by common computer
printers. The next section shows that the common computer
facilities are sub-standard.



3. The ISO Code for Information Processing Interchange

The International Organization for Standardization Code
for Information Processing Interchange, ISO R646, provides
for and is already widely used in telecommunications
[12, 13]. The American National Standard Code (ASCII-1968)
is a proper variant [5, 14]. These codes and the doctrines
for their use are still under development. Revisions
are being considered, mainly to clarify matters of national
use and the "national option" positions. But none of the
proposed revisions would change the basic system upon which
the work reported here rests.

Today, the ISO Code is a 7-bit code (128 patterns) with
33 control functions, "space" and 94 (visible) printing
characters. The control functions provide for communication
needs: "enquire", "acknowledge", "end of transmission",
etc; represent typewriter operations: "line feed",
"carriage return", "horizontal tabulation", "backspace",
etc; and include a few information markers: file, group,
record and unit separators. Each control function is
carefully defined. We have found it possible to translate
unambiguously into the standard those important control
features of typewriters that employ other code schemes.

The 94 graphics characters of the ISO code are a slight
expansion of sets normally found on typewriters for the Latin
alphabet. The set includes the capital and small Latin letters,
numerals, punctuation marks, mathematical operators and a
variety of special symbols.
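
The division of the 128 patterns quoted above can be checked with a short census. The sketch below is illustrative only: it assumes that the 33 control functions are the 32 positions in columns 0 and 1 together with DEL, and it uses present-day ASCII numbering as a stand-in for the ISO table.

```python
# Census of a 7-bit code: 128 bit patterns divided into control
# functions, the space character, and visible graphics.
# Assumption: the 33 controls are NUL..US (32 positions) plus DEL.
controls = [c for c in range(128) if c < 0x20 or c == 0x7F]
space = [0x20]
graphics = list(range(0x21, 0x7F))  # '!' through '~'

census = {
    "controls": len(controls),
    "space": len(space),
    "graphics": len(graphics),
}
print(census)
print("total patterns:", sum(census.values()))
```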

The ISO code is shown in the left hand side of Figure 3a
starting with the character NUL and continuing through the
character DEL in the columns labeled 0 through 7. It is suitable for
simple text. Since the bulk of most scientific papers is
simple text, the basic facilities of the ISO code cannot
be spared. It is not possible to replace any usefully
large number of symbols of the basic set with ones more
useful for scientific work.



Figure 3. A family of codes related to the ISO Information
Processing Code. All are displayed in the same
arrangement. Columns 0-7 have the basic 7-bit
code, and columns 8-15 have extensions. These
two regions are arranged in the same manner:
columns 0 and 1 (8 and 9) are for controls,
columns 2-7 (10-15) contain graphic symbols.

a. Japanese Industrial Standard Code [15]. The
   extension (columns 10-13) provides Katakana
   characters as an alternative set.

b. USSR Alpha-Numeric Code [16]. The extension
   repeats columns 2 and 3 in columns 10 and
   11. Cyrillic characters, columns 12-15, are
   arranged to match their Latin equivalents
   in columns 4-7 where possible (but with
   capital Cyrillic overlaying small Latin
   letters).

c. American National Standard Code for Information
   Interchange (ASCII) and an extension
   for library use. Columns 2-7 give the basic
   ASCII set [14]. The extension provides a
   large collection of diacritical marks and some
   infrequently used letters. This combined set
   is used as an 8-bit code by the U.S. Library
   of Congress [17].

3.1 Addition of new graphic symbols. In recent years
there has been considerable standards activity devoted to
extending this code while retaining the philosophy that
underlies its construction and control features. New
control features and new sets of graphics have been
introduced. The right hand side of Figure 3a is an example of
new graphics [15]. This set includes the Katakana characters,
i.e. a Japanese syllabic script. The set illustrates the
principle: a new set of up to 94 graphics is introduced.
Members of this set are invoked in the Japanese Industrial
Standard 7-bit code by using an existing control,
Shift-Out (SO). Shift-In (SI) restores the standard set. At
the present time techniques are being developed to permit
inclusion of larger sets of characters, such as that needed
to represent (more specifically, to encipher) Kanji
(literally, "Chinese") by using two standard graphics to represent a
single Kanji character.
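
The Shift-Out and Shift-In mechanism can be sketched as a small decoder that switches between two graphic tables. This is only an illustration of the principle: the alternate table below is hypothetical (three Greek letters stand in for a real alternate set such as Katakana), and the function name `decode_shifted` is not taken from any standard.

```python
SO, SI = 0x0E, 0x0F  # Shift-Out and Shift-In positions in the 7-bit code

def decode_shifted(data, standard, alternate):
    """Decode 7-bit values, switching graphic tables on SO and SI."""
    table = standard
    out = []
    for byte in data:
        if byte == SO:
            table = alternate          # invoke the alternate graphic set
        elif byte == SI:
            table = standard           # restore the standard graphic set
        elif 0x21 <= byte <= 0x7E:
            out.append(table.get(byte, "?"))
        else:
            out.append(chr(byte))      # space and other controls pass through
    return "".join(out)

standard = {b: chr(b) for b in range(0x21, 0x7F)}
# Hypothetical alternate set: only three positions filled, for illustration.
alternate = {0x41: "α", 0x42: "β", 0x43: "γ"}

message = bytes([0x41, 0x42, SO, 0x41, 0x42, 0x43, SI, 0x43])
decoded = decode_shifted(message, standard, alternate)
print(decoded)
```

The same bit patterns (4/1, 4/2, 4/3) are read as different graphics depending on which set is currently invoked, which is why a set of up to 94 new graphics can be added without widening the code.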

Figure 3b shows another language extension, that for
the Cyrillic alphabet, taken from the national standard
for the USSR [16]. This illustrates a useful principle.
Wherever possible the corresponding Cyrillic and Latin
letters occupy corresponding positions in the table. Thus
a rough idea of a Cyrillic or Latin text can be obtained
from the output of a machine that can print only one side
of the table.

Figure 3c shows another example of extension. This
introduces a set of graphics desired by librarians. The
U.S. Library of Congress uses this code to distribute
bibliographic information on magnetic tape to an
international clientele [17]. Only 56 symbols are added.
Diacritical marks are emphasized.

3.2 Addition of new controls. Means of extending the
repertory of controls are also under development. The
means anticipate the use of two techniques. Figures 3a, b
and c illustrate one technique, that of adding to the "width"
of a code to secure the 256 characters of an 8-bit code.

In an 8-bit code columns 10 through 15 (excluding the
positions 10/0 and 15/15) are reserved for graphic
characters.* Columns 8 and 9 are controls. Thus an 8-bit code
provides for the addition of 34 controls to those of the
basic standard code.



* Here and in later sections of this paper, the positions
in code tables are designated by coordinates: column/row.
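
The column/row notation and the counts quoted for the 8-bit code can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names `position` and `region` are invented for the illustration, and positions 10/0 and 15/15 are counted with the extension controls, which is the reading that yields the figure of 34.

```python
def position(column, row):
    """Convert a column/row table coordinate to the numeric code value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each lie in 0..15")
    return column * 16 + row

def region(code):
    """Classify an 8-bit code value by the table layout described above."""
    column, _row = divmod(code, 16)
    if column in (0, 1) or code == position(7, 15):
        return "basic control"
    if code == position(2, 0):
        return "space"
    if column <= 7:
        return "basic graphic"
    # Assumption: 10/0 and 15/15 are counted among the extension controls.
    if column in (8, 9) or code in (position(10, 0), position(15, 15)):
        return "extension control"
    return "extension graphic"

counts = {}
for code in range(256):
    name = region(code)
    counts[name] = counts.get(name, 0) + 1
print(counts)

# The slant is cited later in the paper as position 2/15.
print(position(2, 15), chr(position(2, 15)))
```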

The other technique is the use of a special code
extension character, the meaning of which depends on how it
is used. In the 7-bit code this is Escape (ESC). Escape
is used as the first character of a sequence of two or more
characters which, as a unit, represent a single control
function.

The doctrines being developed for code extension provide
for various additional levels of complexity. Our concern
is only with the first level of extension, where the
doctrines are already being implemented in hardware and
software. For example, the important functions of Half
Line Feed Forward, Half Line Feed Reverse, Reverse Line Feed,
Clear Horizontal Tabulation Stops and Set Horizontal
Tabulation Stops are available on stock teletypewriters [18]
where they are represented by two-character Escape sequences.
None of these controls is in the basic ISO set.
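
A two-character Escape sequence can be treated as one unit by a simple tokenizer, as in the sketch below. It handles only the two-character case. The final characters "U" and "D" are hypothetical placeholders chosen for the illustration; the actual assignments depend on the device and are not given here.

```python
ESC = 0x1B  # Escape in the 7-bit code (position 1/11)

def tokenize(data):
    """Split a 7-bit stream into tokens, treating ESC plus the one
    following character as a single two-character control token."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data):
            tokens.append(("escape", bytes(data[i:i + 2])))
            i += 2
        else:
            tokens.append(("char", bytes(data[i:i + 1])))
            i += 1
    return tokens

# Hypothetical assignments, for illustration only.
HYPOTHETICAL_CONTROLS = {
    b"\x1bU": "half line feed reverse",
    b"\x1bD": "half line feed forward",
}

stream = b"x" + b"\x1bU" + b"2" + b"\x1bD"
tokens = tokenize(stream)
names = [
    HYPOTHETICAL_CONTROLS.get(tok, tok.decode("ascii")) if kind == "escape"
    else tok.decode("ascii")
    for kind, tok in tokens
]
print(names)
```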



3.3 Code extension - general remarks. Current
standardization work on code extension envisions a family
of 8-bit codes and extended 7-bit codes, in which each
member of the family retains the facilities of the basic
7-bit code as a subset [13]. (This is similar to the
treatment of the keypunch and the common computer printer.
They are contained in and defined as a subset of the basic
7-bit code [19].)

This concept of adding levels of complexity while
retaining standard subsets is crucial to the orderly
development of general purpose facilities over a long period of time.
It is used today to assure a high level of correspondence
among national codes. For example, ASCII (columns 2 through
7 in Figure 3c) differs from the Japanese standard code (same
columns in Figure 3a) at only three positions: 5/12, 7/12,
and 7/14. The variations in Figure 3b also are minor.
Thus the basic ability to communicate among systems
supporting the three codes shown (Figures 3a-c) is assured. For
example, virtually all of the text of this paper could be
printed by any of them.

The examples of code extension discussed above emphasize
enlargement of the graphic character repertory. Clamons [20] has
published a summary of current work on character codes which
includes a summary of proposals for extensions of the control
repertory with some emphasis upon extensions intended to be useful in
cathode ray display devices. Clamons' paper includes a
multicolored chart which provides a very convenient summary of the
inter-relationships among ASCII structured character codes.

4. A General Purpose Scientific Document Code (GPSDC)



Code extension, discussed in section 3, has been used to
adapt the information interchange standard to the needs of 
science. In effect, this converts the 7-bit code to an 8-bit 
code and provides for up to 34 new controls and 94 new graphic 
characters. 

4.1 The extended code. The set of characters proposed
here for use in documentation of scientific work is shown
in Figure 4. This proposal is a substantial revision of that
originally made [7] and is slightly different from the set
of graphics initially selected for installation on a printer
at NBS [8]. The revisions and rearrangements have been made
on the basis of experience with the earlier set, certain
standards developments and comments received. They have been
made in the hope that they will increase the utility of the
code.

These 189 characters form a set of "primitive graphics"
that are to be realized in hardware. The needs of science are
not met completely by these. They must be supplemented by the
techniques described in the following three subsections.

The best approximation of this set realized to date is
shown in Figure 5. This is a reproduction of the actual
computer printed code table used in documentation describing the
facilities of the NBS Computer Services Center used to produce


Figure 4

General Purpose Scientific Document Code. This is an extension of
the ISO Code to meet the needs of scientific text. Columns 0-7
have the basic 7-bit code and columns 8-15 the extension. Compare
with Figures 3a, 3b and 3c. The new controls, in columns 8-9, are:

        New Line
        Execute Control with Parameter
        Reverse Line Feed
        Half Line Feed Reverse
HLF     Half Line Feed Forward
        Extended Shift Out
ESI     Extended Shift In
        Accept Control Parameter
AGQ     Accept Graphic Qualifier
TCG     Terminate Composite Graphic
DT1     Device Test 1
DT2     Device Test 2
DT3     Device Test 3
r*      reserved for future assignment

Defining Representation - 1972 November 10

The design standard for GPSDC specifies monowidth spacing in metric
units for the basic set of 189 typewriter-like printing characters.
Character depth (vertical line spacing) is 4 mm and character set width
(horizontal spacing) is 2 mm. The defining representation implies
digitalization on a dot matrix with dots spaced 0.05 mm on centers and
printing using dots with a fixed diameter lying between 0.1 and
0.15 mm. The characters slant, 2/15, and long dash, 10/14, are,
respectively, a full diagonal and a full horizontal stroke in the
print window used to define the basic set.
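
The dimensions quoted above fix the size of the dot matrix behind each character. A minimal worked calculation follows; lengths are held in micrometres so that the arithmetic stays in exact integers, and the variable names are illustrative only.

```python
# Print window arithmetic from the defining representation.
CELL_WIDTH_UM = 2000          # character set width, 2 mm
CELL_DEPTH_UM = 4000          # character depth, 4 mm
DOT_PITCH_UM = 50             # dot spacing on centers, 0.05 mm
DOT_DIAMETER_UM = (100, 150)  # permitted range of dot diameter

columns = CELL_WIDTH_UM // DOT_PITCH_UM   # dot pitches across one cell
rows = CELL_DEPTH_UM // DOT_PITCH_UM      # dot pitches down one cell
aspect = CELL_DEPTH_UM / CELL_WIDTH_UM    # depth-to-width ratio of the window
diameter_in_pitches = [d / DOT_PITCH_UM for d in DOT_DIAMETER_UM]

print(columns, rows, aspect, diameter_in_pitches)
```

Each cell is therefore about 40 dot pitches wide and 80 deep, a 2:1 window, and each dot spans two to three pitches, so neighbouring dots overlap and a stroke prints as a continuous line.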

NBS Computer Services
GPSDC Code Table
1972 Jul 7

Rendition set up for IBM 1416 Train Cartridge, Order # G 83154-55,
for use on an IBM 1403-N1 Printer with features: (1) Universal
Character Set, (2) Wide Hammers and (3) 16 Lines per Inch Spacing.
See: NBS Tech. News Bull. 54 No. 2, 35 (Feb. 1970)

MSP, "meta space", use restricted, ESC 3/0 in 7-bits
EO, "Eight Ones", use restricted, ESC 3/15 in 7-bits
r*, controls reserved for future assignment

Graphics not on this printer

    aps, apostrophe, 10/7 prints instead
    bc, bracket cap
    bf, bracket foot
    csm, currency sign
    ob1 - ob8, "oct obliques". See: Gottardi, J. Chem. Doc. 10, 75 (1970)

Graphics not on this printer as simple symbols but that can
be constructed

    Phi, Greek Φ
    phi, Greek φ
    pst, pound sterling sign, £
    the, Greek θ

Figure 5. Approximation of the GPSDC table using an existing high
speed line printer.

the camera-ready copy for this Technical Note. The character at
position 10/0, Meta Space, is considered a control not freely
available to the user as a notational device. The character is
discussed further in Section 4.9.

4.2 Internal Extension - Composite Symbols. Figure 6
displays examples of "composite" characters rendered by two
different printing mechanisms. Composites are important
because they provide an "internal extension" to obtain additional
characters.

Composites are overstrike combinations of two symbols, such
as combining a "less than" (<) and "low dash" (_) to make "less
than or equal" (≤). This technique is commonly used for
accented letters. Here it is exploited more fully. Any pair
of symbols can be combined. The combination should have an
easily recognized meaning. The basic ISO code document allows
for the use of this technique of combining two symbols in one
location to create a symbol with a different meaning.
"Backspace" is commonly used in this technique. Figure 6 shows only
a partial listing of the composite symbols presently being used
in GPSDC. By implication it indicates our practice of cataloging
composites in sets of 94 to allow for possible future
developments which would treat our composite symbols as primitive
symbols in alternative sets of graphic characters to be invoked
using higher level code extension techniques [13].
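
The overstrike technique can be sketched as follows: characters are grouped by print position, with backspace moving one position to the left, and each pair found at one position is looked up in a catalog of composites. The catalog shown is hypothetical and holds three entries only; it is not the GPSDC listing of Figure 6.

```python
BS = "\b"  # backspace

def overstrike_cells(line):
    """Group characters by print position; backspace moves one cell left."""
    cells = []
    col = 0
    for ch in line:
        if ch == BS:
            col = max(col - 1, 0)
            continue
        if col == len(cells):
            cells.append([])
        cells[col].append(ch)
        col += 1
    return cells

# Hypothetical catalog; keys are unordered pairs of primitive symbols.
COMPOSITES = {
    frozenset("<_"): "≤",
    frozenset(">_"): "≥",
    frozenset("=/"): "≠",
}

def render(line):
    """Replace each cataloged overstrike pair by its composite symbol."""
    out = []
    for cell in overstrike_cells(line):
        if len(cell) == 1:
            out.append(cell[0])
        else:
            out.append(COMPOSITES.get(frozenset(cell), "".join(cell)))
    return "".join(out)

rendered = render("a <\b_ b")
print(rendered)
```

Keying the catalog on an unordered pair means that "<" followed by "_" and "_" followed by "<" give the same composite, which matches the way two strikes at one print position look on paper.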

4.3 Class Modifications. Another "internal extension" is
class modification. Seven class modifications are specified.
These are produced by underscoring and/or overscoring with
dashes, waves, arrows and dots. These modifications apply
to all the symbols in the set, including the composites.
The meaning of a particular modification is not defined.
However, the most common use to date has been to indicate
several type faces. Class modification is the typescript
notation which provides the implementation of the stylistic
requirements, i.e. italic, bold face, etc., of the IUPAC
rules cited in section 2.

4.4 Control Functions. The addition of two control
functions completes this code for science. These controls
are "Half Line Feed Forward" and "Half Line Feed Reverse".
These are vertical motions on a page. They permit placement
of superscripts and subscripts. As previously described,
these have already been introduced by some manufacturers in the
first level of extension of the control set for the standard
code by use of escape sequences.
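
How two half-line motions suffice for superscripts and subscripts can be shown with a small state machine that tracks the vertical level of each printed character. The token names `HLR` and `HLF` are placeholders for the two controls, and the function `place` is invented for the illustration.

```python
HLR, HLF = "HLR", "HLF"  # placeholder tokens for the two half-line controls

def place(tokens):
    """Return (character, level) pairs; level counts half lines above (+)
    or below (-) the base line."""
    level = 0
    placed = []
    for tok in tokens:
        if tok == HLR:
            level += 1   # reverse motion moves up the page
        elif tok == HLF:
            level -= 1   # forward motion moves down the page
        else:
            placed.append((tok, level))
    return placed

# H2O with a subscript 2, then x with a superscript 2.
water = place(["H", HLF, "2", HLR, "O"])
square = place(["x", HLR, "2", HLF])
print(water)
print(square)
```

A character at level -1 is a subscript and one at level +1 a superscript; repeating a control gives second-order levels, such as a subscript on a superscript.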

Figure 6

Composite symbols. Examples of new characters formed by
overstriking pairs of characters from Figures 4 and 5.

a. Composites drawn by a computer driven incremental 
plotter (see Section 4.8) using the preferred 2:1 
print window, Figure 4 characters. 

b. Composites as produced on the NBS line printer 
showing the distortion caused by adoption of a 
1.25:1 print window, Figure 5 characters. 

4.5 Selection of graphic symbols. The general basis for
symbol selection has been described in section 2. The required
techniques of using composite symbols and class modification
are explained earlier in section 4. A detailed defense of
each symbol is inappropriate here. The objective is to provide
a set in which there exists one form for each required symbol,
not the entire range of alternatives that could be used.
It is expected that the scientist who uses this system will
have to make some compromises. He does this now:
he must work within the set of symbols available to his
typist. We are confident that the code provides reasonable
solutions to all but the most abstruse problems of preparing a
typescript.

4.6 Options in the ISO Code and in ASCII. Practical
experience suggests that a graphic character set can be
selected which will be found to be serviceable in non-scientific
applications, or at least in technical applications not
originally contemplated. Our original emphasis on recording
scientific text may be unwarranted. This possibility is
important to our computer center and probably to others.

For this reason, certain non-scientific symbols have
been included. The approach has been to incorporate symbols
that may have alternative forms in the ISO and ASCII
codes. Thus, the extension includes the Pound Sterling
and the General Currency Symbol which are, in the
ISO code, proposed alternatives for the Number Sign (#) and
the Dollar Sign ($) in ASCII. A solid Vertical Line has been
substituted for the broken Vertical Line at 7/12 (Fig. 3c) in
anticipation of a revision of ASCII to correspond to international
practice.

4.7 Diagrams. A more important consideration is the
provision of an adequate set of rule segments for diagrams
and chemical structures. The set proposed by Gottardi [21]
for chemical structures has been included in toto. This set
is, in our opinion, equally useful for many classes of
diagrams. It should be considered very seriously by equipment
designers. The set of rule segments and plotting dots is
collected in columns 4' and 5' of the Shift Out set in Figure 7.
Taken alone they form a reasonable extension of the basic ASCII
set and could be realized on a Model 37 "Teletype" (which can
print 126 symbols).

4.8 Proposed Standard Print Window. The design of
a set of graphics suitable for the formation of composite
symbols and for the construction of diagrams requires the
consideration of compatibility in connection with almost
every member of the set. In the course of this work it
became clear that we needed to adopt some concept of a
standard "print window" realizable in hardware. A print
window is the rectangular space within which a graphic
character is placed. For mono-width character sets of
the same size or font all symbols have the same print
window, although many of them occupy only a portion of it.
(Indeed, the space around a symbol is an important element
in its design.) For typewriting on single spacing, the
print windows fill a page. For two adjacent windows on a
line the left edge of the second is the right edge of the
first. For two adjacent windows in a column the top edge
of the lower is the bottom edge of the upper. When
subscripts and superscripts are produced by half-line spacing,
print windows may overlap vertically.

At present, the existing hardware (line printers, displays,
typewriters) shows considerable variation in character and
interline spacing. No one print window is applicable to
all. Different compromises must be made in the design of
a set of graphics. For example, existing computer line
printers employ the spacing used on the relatively uncommon
"pica" typewriters where the horizontal spacing is 10
characters per inch and the vertical spacing is 6 lines
per inch. In contrast, Gottardi achieved rational slopes
for rule segments by having his printer modified to space
10 half lines per inch. In his implementation a square
print window is used.

Our recommendation is neither of these. It is a
print window with an aspect ratio of 2:1. It is further
specified that graphic characters may extend over the
entire height and width of the print window, although
most of them do not. This 2:1 print window meets the
requirements for readable mono-width characters: it has
the same relative spacing as the common "elite" typewriter
(12 characters per inch horizontally, and 6 lines or 12
half lines per inch vertically). It also permits the use
of the set of rule segments specified by Gottardi.

FIGURE 7

An extended code set. One hundred twenty-six distinct
characters. Columns 2' and 3' repeat columns 2 and 3.
The extension in columns 4' and 5' emphasizes plotting
dots and rule segments. Figure produced directly on a
Model 37 "Teletype".

On typewriters one can secure printing over a full print
window so specified. However, this is not possible on any
existing computer line printer we have examined. For example,
in the case of the printer used to produce the copy for Figure 5
and the bulk of the copy for this Note, the maximum permitted
vertical extent of a graphic character is 0.137 inch. Thus, in
adapting the GPSDC graphic character set to this printer we
introduced distortion by using a print window appropriate to 8
lines per inch vertical spacing (1/16 inch half line space) and
10 characters per inch horizontal spacing. The aspect ratio is
1.25:1. The resulting distortion is of no particular
consequence in ordinary text. However, a comparison of the two
renderings of composite symbols in Figure 6 shows that cramping
in the vertical direction has led to printed composite symbols
which are not quite satisfactory. In addition, in carrying out
the distortion of symbols we altered the slopes of the
rule segments and the positions of the plotting dots.
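
The aspect ratios quoted above follow directly from the character and line pitches. The short Python sketch below reproduces that arithmetic; the function name `print_window` is introduced here for illustration and is not part of GPSDC.

```python
from fractions import Fraction

def print_window(chars_per_inch, lines_per_inch):
    """Return (width, height, aspect) of a print window, in inches.

    The aspect ratio is height divided by width, so the recommended
    2:1 window comes out as Fraction(2, 1).
    """
    width = Fraction(1, chars_per_inch)
    height = Fraction(1, lines_per_inch)
    return width, height, height / width

# "Elite" typewriter spacing: 12 characters and 6 lines per inch.
elite = print_window(12, 6)
# Line printer compromise: 10 characters and 8 lines per inch.
printer = print_window(10, 8)

print(elite[2])        # 2, i.e. 2:1
print(printer[2])      # 5/4, i.e. 1.25:1
print(printer[1] / 2)  # 1/16 inch half line space
```

Exact fractions are used so that the ratios compare cleanly with the figures stated in the text.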

Figure 2 suggests that this distortion will be of little
consequence in rendering some classes of diagrams. However,
it will be important in scale drawings such as Figure 8. We
feel that this compromise with existing hardware is a temporary
expedient. The dimensional specifications of line
printers can be changed in future models if manufacturers so
choose. In any case alternatives to the common line printer
mechanism are available and are coming into more common use.

A part of our current program of work is the implementation
of a convenient scheme for simulating a document
writer using a computer driven incremental plotter. The
driver programs make use of a catalog of digitalized graphic
characters. This effort is intended to produce a design
tool which can produce individual large scale drawings
of graphics of the kind suitable for documenting specifications
while, at the same time, being able to produce smaller
scale text so as to test the compatibility of the graphics.
In planning this portion of the work we have anticipated
certain advantages in having a catalog of symbol specifications
which can be rendered on a document writer providing
only the printing facilities of GPSDC. Thus, we have chosen
to develop binary specifications of symbols as illustrated
in Figure 8. We describe these specifications as "binary"
because the symbol specification is rendered at large scale
by a dot - no dot scheme using the large centered dot at
position 11/0 in the code table.
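
The dot - no dot idea can be sketched in a few lines of Python. The 5 x 7 pattern below is a hypothetical stand-in, not an entry from the actual catalog (Figure 8 uses a finer grid), and the function name `render` is likewise an assumption made for illustration.

```python
# Hypothetical 5 x 7 pattern for the letter "T"; the actual GPSDC catalog
# entries are not reproduced here.
GLYPH_T = [
    "11111",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
    "00100",
]

def render(glyph, dot="●", blank=" "):
    """Render a binary glyph as text, one output line per glyph row.

    Each "1" becomes the dot character and each "0" a blank; trailing
    whitespace on a row is dropped.
    """
    return "\n".join(
        "".join(dot if cell == "1" else blank for cell in row).rstrip()
        for row in glyph
    )

print(render(GLYPH_T))
```

Because the drawing uses only one printing character and the space, any document writer that offers the basic printing facilities could reproduce it.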

This part of the work is in large measure an experiment to
help us better understand the implications of "registration" of
coded character sets. The notion of "registration" as an element
of the standardization process has been invoked at the USA
Federal [22] and the ISO [23] levels. This whole paper, not
only the particular facility being discussed at this point,
speaks to the mechanisms of analysis, evaluation and documentation
of the kind with which, it seems to us, any "registration
authority" would be concerned.

It is not possible here to give an account of all the
background considerations which led to the symbol cataloging
scheme chosen. Basically, we chose a compromise working
scheme which gives definitions somewhat more coarse than those
required to carry the full artistic burden of graphic arts
typography [24,25]. On the other hand the specification is
judged to be sufficiently fine grained to carry those elements
of distinctiveness required for symbol recognition on
the part of any human reading for content. In addition, when
prepared on a typewriter, Figure 8 has a scale of approximately
40:1. We have prepared input specification forms at
this same scale; it is a convenient one with which to work.

At this time it does not appear to be possible to define
a "best" choice for the absolute dimensions of Figure 8. For
the time being we are suggesting that the horizontal single
space shown in Figure 8 be taken as 2 millimeters. This
suggestion takes into account the effort to progress toward a
metric scheme for measuring typefaces as exemplified by a
recently adopted British standard [26]. At present we are
cataloging what could be described as typewriter style graphic
characters designed for monowidth spacing where the single
line spacing ("character depth" in the typographic standard
cited) is twice the character width. In Figure 8 the marks
labeled A, B, C and D indicate, respectively, the tops of
large letters, the tops of small letters without ascenders,
the base line for letters and the bottom of descenders on
those letters where they occur.




4.9 The Meta Space Character. When a user interacts
with a system through the interchange of printed text,
there is almost invariably a need to mix data text with
control text. There are many uses for visible flags and
delimiters. There is sometimes contention between systems
designers and users over the matter of reserved symbols.
The character Meta Space was introduced into the GPSDC
character set to aid in alleviating the problem of reserved
symbols. System implementors are encouraged to
consider the following guidelines in devising uses for Meta
Space.

(1) When Meta Space occurs at a print position by
itself, i.e. not as part of a composite, it may be
treated as ordinary space or, perhaps, "required
space" to indicate that a textual element including
space is not to be broken in adjusting line lengths.
In manuscripts for processing by a typesetting program
it may be used to represent "em space".

(2) When the need arises to define a visible control
or delimiter, Meta Space should be one of the components
of a composite symbol assigned the required
meaning. For example, in one processing system used
at NBS, Meta Space plus Exclamation Point is used
as a Command Prefix. In this same system Meta Space
overstruck with a Question Mark is used as a location-specific
diagnostic error signal in system generated
printing.

(3) Output options should be used to control the
printing or suppression of Meta Space, composite
symbols involving Meta Space or commands delimited
through the use of Meta Space.

(4) The system designer must specify each proper use
of the Meta Space character. (188 printing characters
are available for recording data; only this one is
reserved for control purposes.)
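
The four guidelines can be illustrated with a small Python sketch. It rests on assumptions not taken from the text: a print position is modeled as the set of characters struck there, the Unicode character U+2420 stands in for Meta Space (no actual GPSDC code position is implied), and the names `classify` and `printable` are hypothetical.

```python
META = "\u2420"  # stand-in for Meta Space; not an actual GPSDC code value

def classify(position):
    """Classify one print position, given as a set of overstruck characters."""
    if META not in position:
        return "data"
    others = set(position) - {META}
    if not others:
        return "required space"      # guideline (1)
    if others == {"!"}:
        return "command prefix"      # NBS example in guideline (2)
    if others == {"?"}:
        return "diagnostic"          # NBS example in guideline (2)
    return "unassigned control"      # guideline (4): designer must specify

def printable(line, show_controls=False):
    """Apply an output option (guideline 3) to a list of print positions."""
    out = []
    for pos in line:
        kind = classify(pos)
        if kind == "data":
            out.append("".join(sorted(pos)))
        elif kind == "required space":
            out.append(" ")
        elif show_controls:
            # Print the visible component of the control composite.
            out.append("".join(sorted(set(pos) - {META})))
    return "".join(out)

line = [{"N"}, {"2"}, {META}, {META, "?"}, {"O"}]
print(repr(printable(line)))
print(repr(printable(line, show_controls=True)))
```

Since only composites containing the reserved character are treated as control text, every other printing character remains free for data.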

4.10 Keystrokes versus Characters. In designing GPSDC
we have taken into account the developments which are leading
to economical solid-state logical components associated with
keyboards. We think, for example, keyboard operators will
not continue to have to use two keystrokes to represent a
"New Line" through the successive actions of Line Feed (LF)
and Carriage Return (CR). We expect modern keyboards which
emit standard codes to be augmented by character sequence
generators which will emit the proper sequences of characters
to represent frequently used functions. As a consequence we
have not, in general, equated "number of characters" with
"number of keystrokes" in deciding which control functions
or symbols should be assigned as single characters in the
GPSDC code table.
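
A character sequence generator of the kind anticipated above amounts to a table lookup. The Python sketch below is illustrative only: the key name "NEW LINE" and the table `KEY_SEQUENCES` are hypothetical, and the order of the two control characters simply follows the order in which they are named in the text.

```python
LF, CR = "\n", "\r"

# Hypothetical key names; each key maps to the character sequence it emits.
KEY_SEQUENCES = {
    "NEW LINE": LF + CR,  # one keystroke, two characters
}

def emit(keystrokes):
    """Expand a list of keystrokes into the coded character stream.

    Keys without an entry in the table emit themselves unchanged.
    """
    return "".join(KEY_SEQUENCES.get(key, key) for key in keystrokes)

keys = ["H", "2", "O", "NEW LINE"]
stream = emit(keys)
print(len(keys), "keystrokes ->", len(stream), "characters")
```

The count of keystrokes and the count of recorded characters differ, which is the distinction the section draws.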



5. Preparation of copy and input devices

The input process and the desirable characteristics of
input machines are described in this section. The discussion
is limited to the preparation of copy on typewriters. We have
not made a sufficient examination of cathode ray display
devices.

5.1 Preparation of copy - the operator's view. The
basic mode of capture of text should be as similar to ordinary
typewriting as possible. The operator should be a scientific
typist, not a compositor. This reduces the need for special
training.

Free-form copy must be acceptable. It should be possible
to produce a page of copy that is exactly the same as the
final copy of the typescript of a scientific paper. When an
ideal machine is used this means that all symbols will appear
on the page clearly readable, i.e., with no ciphers, in their
proper positions, and without the introduction of visible
control information.

An input system may allow for highly stylized typing of
fragments of copy followed immediately by reformatting. This
is very useful in an interactive system. It should be an
addition to, not a substitute for, the basic free-form input
mode.

5.2 Preparation of copy - processing requirements.

It is assumed that the record produced by the input device
will be processed by a computer program. This is desirable
because the process can eliminate many restrictions on the
typing. Records can be cleaned up to correct errors
recognized by the typist. Also the string of codes representing
a line of text can be put into a standardized sequence that
will simplify later processing, particularly information
retrieval. Computer processing will be necessary if the
input machine produces a code sequence other than ISO. This
translation should be accepted as a normal procedure.

It is mandatory that each keyboard action be encoded
in the record produced. These keyboard actions should be
sufficient to produce the full record.

5.3 Specific Features of typewriter-like input machines.
The important features for an input device are:

(1) A basic set of 80-94 graphic characters and the
normal set of controls that affect the position of
text on a page, e.g. line feed, carriage return,
space and tabulate.

(2) Overstrike capability, e.g. backspace.

(3) Half line feed forward (down) and Half line feed
reverse (up).

(4) Access to an alternative set of graphic characters
and control codes for them.

These features are listed in order of their importance. In
almost every case selection of an input device will require
some compromise. There is no device that matches GPSDC
perfectly, but good approximations can be found.

The basic set of graphic characters. That set normally
supplied by manufacturers is acceptable but usually inadequate.
If a choice is available, maximize the number of
those characters used frequently or those in the GPSDC set.

Overstrike capability. This is useful for three purposes:
correction of errors (by substitution), class modification and
construction of composites. Since these composites may be
ciphers for non-available characters, overstriking is a very
powerful method for extending a basic set of characters.
Both corrections and composites are to be interpreted by the
processing program. Whenever possible, the input keyboard
operator should not be required to observe a prescribed
sequence of operations in overstriking.
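
One way a processing program can free the operator from a prescribed overstriking sequence is to record, for each print position, the set of characters struck there. The Python sketch below makes that assumption; it handles composites only (the convention for corrections by substitution is not specified in the text and is left out), and the function name `positions` is hypothetical.

```python
BS = "\b"  # backspace

def positions(stream):
    """Group a one-line character stream into print positions.

    A backspace moves the carriage one position left, so the next
    character is overstruck on the previous one. Each position is
    returned as a frozenset, which ignores the order of striking.
    """
    cells, col = [], 0
    for ch in stream:
        if ch == BS:
            col = max(col - 1, 0)
            continue
        while len(cells) <= col:
            cells.append(set())
        if ch != " ":          # a space adds nothing to the position
            cells[col].add(ch)
        col += 1
    return [frozenset(c) for c in cells]

# The same composite typed in either order gives the same record.
print(positions("o" + BS + "/") == positions("/" + BS + "o"))
```

A later stage could then look up each set in a table of ciphers without caring which component the typist struck first.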

Half line feed controls. These permit clean encoding
of superscripts and subscripts and make the entire set of
characters available in these positions. They are also needed
for diagrams. If a machine does not have these controls,
super- and subscripts must either be ciphered as composites,
or be included in the basic set. Both of these choices are
poor.
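
How the two half line feed controls encode vertical position can be shown with a short Python sketch. The tokens `HLF` and `HLR` are stand-ins and do not represent actual GPSDC code values; the function name `levels` is also an assumption.

```python
HLF, HLR = "HLF", "HLR"  # stand-in tokens, not actual GPSDC code values

def levels(tokens):
    """Attach a vertical level, in half lines, to each printed character.

    Level 0 is the base line. Half Line Feed Reverse (up) raises the
    level toward superscripts; Half Line Feed Forward (down) lowers
    it toward subscripts.
    """
    level, out = 0, []
    for tok in tokens:
        if tok == HLR:
            level += 1
        elif tok == HLF:
            level -= 1
        else:
            out.append((tok, level))
    return out

# N, half line down, 2, back up to the base line, up again, +, back down.
typed = ["N", HLF, "2", HLR, HLR, "+", HLF]
print(levels(typed))  # [('N', 0), ('2', -1), ('+', 1)]
```

Any character of the set may appear at any level, which is the advantage claimed over ciphering or over adding special superscript characters to the basic set.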



Alternative set of graphics. Access to a second set of
graphics permits clean text preparation for more complex
material. At times substitution of a second set is easy,
e.g. when typespheres are used. But even when the physical
act is difficult, the controls to invoke the alternate set
of graphics are useful. They should be of the type "Select
set X", i.e. shift and lock, or "Select the next character
from set X", i.e. non-locking shift. The use of a single
control to alternate between two sets of graphics should be
avoided. The operator has no reliable method for restoring
a basic condition in case of error.

The choice of the alternative set should be made to
maximize the number of GPSDC characters.
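
The two recommended kinds of shift control can be sketched as a small decoder in Python. The control tokens and the function name `decode` are hypothetical and chosen only for illustration.

```python
BASIC, ALT = "basic", "alt"

# Hypothetical control tokens, one per recommended control type.
SELECT_BASIC = "SELECT BASIC"    # shift and lock to the basic set
SELECT_ALT = "SELECT ALT"        # shift and lock to the alternative set
NEXT_FROM_ALT = "NEXT FROM ALT"  # non-locking shift, one character only

def decode(tokens):
    """Label each graphic with the set it is to be taken from."""
    locked, single, out = BASIC, False, []
    for tok in tokens:
        if tok == SELECT_BASIC:
            locked = BASIC
        elif tok == SELECT_ALT:
            locked = ALT
        elif tok == NEXT_FROM_ALT:
            single = True
        else:
            out.append((tok, ALT if single else locked))
            single = False
    return out

typed = ["a", NEXT_FROM_ALT, "p", "b", SELECT_ALT, "q",
         SELECT_BASIC, SELECT_BASIC, "c"]
print(decode(typed))
```

Repeating `SELECT_BASIC` is harmless, so an operator who is unsure of the current state can always restore the basic condition; a single alternating control would flip the state on each use instead.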

5.4 Experience at NBS. The data centers that use the
GPSDC system have a variety of recording typewriters. They
differ considerably in their features and ease of operation.
The machines in use are identified below. The specific
features (section 5.3) that each has are shown in braces.

Friden "Flexowriter" (circa 1961)   {1, 2}
SCM (CDC) "Typetronic"              {1, 2}
"Taxywriter"                        {1, 2, 4}
Dura (Itel) Model 1041              {1, 2, 3, 4}
IBM "MTST"                          {1, 2, 3, 4?}
IBM "MCST"                          {1, 2, 3, 4?}
Model 37 "Teletype"                 {1, 2, 3, 4}

Other models of these machines may
have more features. We do not expect to have difficulty in
handling input from any typewriter that records on a standard
medium.

Selection of sets of graphic symbols has proved easiest
for the typewriters that use typespheres, e.g., the "Selectric"
typewriters, and for the "Teletypes". This is simply
because the former provides 150-260 useful symbols with two
stock spheres and the latter can provide 126 characters in
its typebox.

Explicit control features for selecting alternative
graphic sets are not available on most of the machines
mentioned above. The practice has been to devote some unused
peripheral control function to this purpose. "Red ribbon
shift" is ideal for this purpose and has been used to advantage
on "Teletypes" to indicate Greek, etc.

Overstriking, particularly for corrections, is so widely
used that a machine without this capability would be
unacceptable in the data centers.



The use of keypunches to produce scientific text must
be mentioned. A complex coding scheme that makes a keypunch
simulate a typewriter {1, 2, 3, 4} was developed as an
emergency measure. To our surprise, this has been used
extensively, particularly in the editing of records.



6. Use of the GPSDC System at NBS

6.1 Implementation. A text-handling system based on
the General Purpose Scientific Document Code has been in
operation since 1967. It is an experimental system. There
has been a constant need to refine the techniques used and to
improve the definitions of the system. The experimental nature
of the system has not daunted its users. They have employed
it for day to day production, and often have used it for tasks
not originally contemplated. They have written substantial
programs for special applications.

The programs for the system are written in FORTRAN. They
are used in batch mode on the NBS Univac 1108 under EXEC II and
EXEC 8 and previously ran on a CDC 3100. The program deck
totals about 25,000 cards. Batch mode is employed, not out of
preference but because only it is available. Many of the
programs would be applicable to an interactive mode.

The basic system provides for input from recording
typewriters, keypunches and from magnetic tapes prepared either in
ASCII 1968 or by on-line text processing systems, such as the
IBM ATS. There are programs for editing, reformatting, search
and retrieval and for output to a line printer, a photocomposition
machine and to ASCII 1968 on magnetic tape.

6.2 Users and extent of use. Three data centers regularly
use GPSDC to prepare their files of information about their
specialties: thermochemistry, chemical kinetics and atomic
spectroscopy. Other groups have used the system to prepare books.
These users take advantage of two features: access to a line
printer for proof copy and access to a high speed photo-composition
machine. The subject disciplines are statistics, diatomic
spectra, molecular structure, analytical chemistry, radiation
chemistry and crystal structure.

The total usage of the system is small when measured
against other automation. During the 12 months ending June 1970,
about 100 jobs per month were logged in by the NBS Computer
Services Center. The current (August 1973) rate of utilization
is about 150 jobs per month. This usage reflects the size of the
centers (one to ten persons) and the work load they can handle.

6.3 Typical applications. The GPSDC system was designed
for recording, in machine-readable form, typescript records that
must be saved and referred to, but will not be published. Data
evaluation groups and information centers produce such records in
large quantities as they scan the literature in their fields,
select some articles for retention and then abstract and index
them. Recording this information is the typical use of the
GPSDC system. An example is shown in Figure 9. Because of the
technical orientation of the centers, scientific notation is
mandatory.

There is a much larger class of records produced by any
institution: file copies of administrative memoranda and
actions. A simpler script and recording convention usually is
sufficient. But administrative and technical records merge in
documents such as annual reports on research accomplishments,
written for managers but inevitably containing technical
notation. Circumlocutions and awkward 'spelling out' of
symbols are characteristic marks of these documents (and of
abstracts) written to fit within the limits of the simpler
scripts.

6.4 Special applications. Several of these warrant
detailed description. They are evidence supporting our contention
that the printed scientific document can be prepared in GPSDC.
The Bulletin of Thermodynamics and Thermochemistry [27] is
prepared using the GPSDC system. Three groups (two outside NBS)
abstract and index current articles using a highly stylized form.
These records are sent to NBS on punched paper tape, converted to
GPSDC, returned for proofreading and then edited. The records
are processed for publication by programs that construct a
bibliographic section and an index arranged by chemical formulas.
(The programs interpret the formulas written in normal scientific
notation and assign the indexing sequence.) Since 1971 four
sections of the Bulletin (organic substances, organic mixtures,
inorganic substances and bibliography) have been photocopied
from output from the NBS line printer. The 1970 inorganic
substances section also was prepared in this way. In 1969,
GPSDC records for this section were printed via a tape driven
typewriter. A magnetic tape version of the 1971 Bulletin,
coded in ASCII 1968, has been produced from GPSDC records and
issued as NBS Magnetic Tape No. 4 [28].

BP I IF: IK/4569   Page 1   CHPLB-1968-2-143   CKIC/15405

AUTH:   Brennen, W., and Shane, E. C.
TITLE:  Pressure-Dependence of the Yellow Nitrogen
        Afterglow Intensity
REF:    Chem. Phys. Letters (Amsterdam) 1968 2 143
REACT:  N + N + M -> N2(A 3Σu+) + M
        N2(A 3Σu+) + M -> N2(B 3Πg) + M
        N2(B 3Πg) + M -> N2 + M
        M =
        N2(B 3Πg) -> N2(A 3Σu+) + hν
        N2(A 3Σu+) + M -> N2 + M
INDEX:  Experimental: gas: Bond cleaved NN:
        Bond formation NN: pressure: energy-transfer:
        fluorescence: excitation: quenching: rate:
        radiative: electronic: second-order:
        chemiluminescence: nitrogen-molecule (product):

(a)

(b) [Line printer copy of the same record; not legible in this copy.]

FIGURE 9

Data Center Records. Indexing record prepared
for the Chemical Kinetics Information Center,
NBS.

a. Input copy prepared on a Model 37 "Teletype".
Ciphers (overstrike combinations) are used
to encode Greek letters.

b. Line printer copy of the same record. The
ciphers have been interpreted.

Several books and journal articles have been produced.
"Tables of Molecular Vibration Frequencies", NSRDS-NBS 39
[29], was keypunched at the University of Tokyo and the cards
sent to NBS to be processed into GPSDC. Line printer copy was
proofread in Tokyo and then returned for correction of the GPSDC
file and preparation for phototypesetting. An example of the
final output is shown in Figure 10. A similar example, Figure 11,
is the preparation of tables of data on rates of reaction of
electrons in water solution [30]. The input was from formatted
typed tables (on paper tape). The figure demonstrates successful
conversion to printing in which there are stylistic complexities.
These examples and others and the techniques used are
presented in more detail in reference [31].

The multivolume handbook "Crystal Data" [32] has been
prepared for printing by using this system in a special
manner. The original copy was keyboarded for one
photocomposition machine (Mergenthaler Linofilm). Later a
decision was made to use another machine (Mergenthaler
Linotron) because a better printing and publication schedule
could be obtained. Linofilm records are converted to
GPSDC, proof copies made on the line printer and then the
corrected copy processed to drive the Linotron.

Conversion of other records from Linofilm coding has
been used to advantage by the Chemical Kinetics Information
Center. Several monographs on kinetics were keyboarded by
the U.S. Government Printing Office as part of the normal
publication process [33]. The machine records were converted
to GPSDC in order to add the contents of these handbooks
to the magnetic tape file on kinetics. Proportional
spacing is lost in the conversion, but virtually all symbols
used by the compositors were converted properly. Probably,
Monotype records could be converted in a similar manner.

Two other examples of the use of "foreign" machine records 
should be mentioned* Memoranda, technical typescripts and 
bibliographies prepared using on-line text processing systems 
are converted to GPSDC for long term storage* The American 
Institute of Physics SPIN tapes, a current awareness service, 
are searched at NBS for several data centers* These tapes 
are ciphered to indicate upper and lower case, subscripts 
and superscripts and use names for Greek and special charac- 
ters* The retrieved material is converted in GPSDC, reformatted 
to match the desires of each center and printed in clear text* 



f>* 5 6ne gys t em for many users * All text handling tasks 
are very similar in the demands that they make on a system* 
This warrants a general approach in which modules of consider- 
able flexibility are invoked at the various stages of input, 
editing, reformatting, retrieval and printing* Flexibility 
is important* By this is meant ability to handle a large class 
Q f closely related variants of the same task* The special 
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Molecule: Bromoaeetylene HCCBr No. 149 

Symmetry (! /x Symmetry number o*= 1 



Sym. 
class 


\«,. 


Approximate 
t y pc* of mode 


Selected 
value of 
frequency 


Infrared 


Hainan 


Comments 


a' 
77 


V\ 

i'\ 

V\ 

V:% 


CH stretch 

C s C stretch 

CBr stretch 


3325 B 
2085 B 
618 C 
618 C 
295 B 


cm' 1 
(Gas) 
3325 VS 
2085 VS 
618 VS 
618 VS 
295 W 


cm" 1 


SF(^). 
SF (1*). 






CCBr dejz. deform 





References 

{il IK. W. J. Middle-ton and W. H. Sharkry, J. Am. Chrm. Sue. 81, »03 I1V.VJ). 
\2\ IK. W. S. Kirhards.m and J. H. Coldslrin, J. Chrm. I'hys. 18, 1314 (19601. 
[3 1 IK. C K. Hund and M. K. Wilson. J. Chrm. f'hvs. 34, 1301 (1%1). 



FIGURE 10 

Typoaraph ic Output » Photograph of a 

page from reference (32]. Copy was prepared 

using the GPSDC system and then trans I at ed 

to the code system of a photo composition 

machine. 
Table 2. Reactions of e_aq^- with water and transients from water

No.   Solute and reaction            pH       k (dm^3 mol^-1 s^-1)  Method   Comments                             Ref.

1.1   H2O                            8.3-9.0  (1.6 ± 0.1) x 10^?    p.r.     computer anal.; contains 7 x 10^?    Hart..66-0015
      e_aq^- + H2O -> H + OH^-                                               M [illegible].
                                     8.3                            p.r.     k detd. at 5-81 °C to give           Ficl.67-0532
                                                                             E_a = 4.5 ± 1 kcal mol^-1.
                                     11       (2.2 ± 0.6) x 10^?    p.r.     contains Ba(OH)2 and 4 x 10^-3 M     Swal68-0418
                                                                             formate ion; extrapolated to
                                                                             formate concn. = 0.
                                     > 7      2.7 x 10^? (rel.)     γ-r.     c.k.; assume k(e_aq^- + NO[?]^-) =   Hent.70-0056
                                                                             1.1 x 10^10, soln. contains
                                                                             3 x 10^-? M NaNO[?] and 5 x 10^-? M
                                                                             glucose; pressures up to 8.85
                                                                             kbar.

1.2   D2O                            9.39     1.25 ± 0.5            p.r.     computer anal., D2O soln. satd.      Hart.68-0025
      e_aq^- + D2O -> D + OD^-                                               with D2.

1.3                                           (6.5 ± 1.0) x 10^?    p.r.                                          Dorf.63-0045
      e_aq^- + e_aq^- -> H2 + 2OH^-  13       5 x 10^?              p.r.                                          Gord....63-0050
                                     10.9     (4.3 ± 0.8) x 10^?    p.r.                                          Gord....63-0073
                                     13.3     (5.5 ± 0.7) x 10^?    p.r.     soln. in equil. with 100             Math.65-0009
                                                                             atm. H2.
                                     12       (6.3 ± 1) x 10^?      γ-r.     steady-state method, soln.           Gott.67-0109
                                                                             H2-satd., method less reliable,
                                                                             k detd. at 10-93 °C to give
                                                                             E_a = 5.2 ± 0.3 kcal mol^-1.
                                     11       6 x 10^?              f.phot.  soln. H2-satd.                       Schm.68-7143
                                     12.7     5.0 x 10^? (cor.)     p.r.     apparent change in k with pH         Brus70-0749
                                                                             has been obs.

1.4   e_aq^- + e_aq^- -> D2 + 2OD^-  13.4     6.0 x 10^?            p.r.     computer anal., D2O soln.            Hart.68-0025
                                                                             contains 5.7 x 10^-3 M D2.

1.5   H                              10.9     ~3 x 10^?             p.r.                                          Gord....63-0073
      e_aq^- + H -> H2 + OH^-        10.5     (2.5 ± 0.6) x 10^10   p.r.     soln. is in equil. with              Math.65-0009
                                                                             100 atm. H2.

1.6   D                              9.39     (2.8 ± 0.2) x 10^10   p.r.     soln. contains 4.5 x 10^-?           Hart.68-0025
      e_aq^- + D -> D2 + OD^-                                                M D2 in D2O.

1.7   OH                             10.5     (3.0 ± 0.7) x 10^10   p.r.     soln. contains only NaOH.            Math.65-0009
      e_aq^- + OH -> OH^-            11       ~3 x 10^10            p.r.                                          Gord....63-0073

1.8   OD                             11.15    (2.8 ± 0.2) x 10^10   p.r.     computer anal., D2O soln. of         Hart.68-0025
      e_aq^- + OD -> OD^-                                                    NaOD.

1.9   O^-                            13       (2.2 ± 0.6) x 10^10   p.r.     soln. in equil. with 50 atm.         Math.65-0009
      e_aq^- + O^- -> 2OH^-                                                  H2, contains NaOH; not very
                                                                             reliable value.

1.10  O2^-                           11.1     1.3 x 10^10           p.r.     d.k. at 650 nm (e_aq^-);             Grue...71-0171
      O2^- -> O2^2-                                                          computer anal.



Tabular data. Typeset material from reference
[30]. Original copy was prepared on a punched
paper tape typewriter in essentially the same
format using half-line spacing for superscripts
and subscripts and underlining to indicate
italics. Changes in type size and font, use
of inferiors and superiors and of rules were
introduced by editing the GPSDC copy.
applications presented by users of the GPSDC system have had
a major impact on design of modules. Each application has
revealed desires (or demands) that reflect the user's wish
to invoke the capabilities of the total publishing process,
human and machine, as opposed to printing. The users have
agreed to a general approach. They know that their next job
will require a variant. They also have learned that special
programming on their part will be minimized.

The modular structure of the system is based on a clear
separation between devices, which may use any code formalism,
and archival representation in GPSDC. As soon as possible in
the work flow, the code stream from an input device is converted
to GPSDC. All processing is done on the GPSDC form. As late
as possible the records are converted for use on a specific
output device. Another design criterion is that individual
devices should be treated as members of a class, the best of
which is slightly more powerful than any known member.
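
The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code tables, the device names and the function names are hypothetical, and ordinary Python strings stand in for the GPSDC record form. None of it reproduces the actual GPSDC code assignments or programs.

```python
# Hypothetical code tables for illustration only; they are NOT the real
# GPSDC assignments or the codes of any actual device.
TYPEWRITER_A = {0x01: "k", 0x02: " ", 0x03: "=", 0x04: "2", 0x7E: "±"}
TYPEWRITER_B = {0x41: "k", 0x20: " ", 0x3D: "=", 0x32: "2", 0x5C: "±"}
LINE_PRINTER = {"k": b"K", " ": b" ", "=": b"=", "2": b"2", "±": b"+/-"}


def to_canonical(codes, input_table):
    """Decode a device code stream into the canonical form (done first)."""
    return "".join(input_table[c] for c in codes)


def collapse_spaces(text):
    """An example processing step; it sees only the canonical form."""
    return " ".join(text.split())


def to_device(text, output_table):
    """Encode canonical text for one output device (done last)."""
    return b"".join(output_table[ch] for ch in text)


def run(codes, input_table, output_table):
    """Input conversion, then processing, then output conversion."""
    canonical = to_canonical(codes, input_table)
    return to_device(collapse_spaces(canonical), output_table)
```

With this arrangement each new device needs only its own table; the processing step is never rewritten. For N input devices and M output devices, N + M tables suffice in place of N x M direct converters.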

One result of this general approach is that all of the
input from typewriters is interpreted by one program, although
six different types of machines have been used. Also,
records written originally in all capitals have been "marked
up" by textual substitution routines and then passed through
the same program.
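
As a rough illustration of marking up an all-capitals record by textual substitution, the following Python sketch lowers the case of the record and then applies an ordered list of substitution rules. The rules, the sample record and the function name are hypothetical. The sketch also simplifies the process described above, since it produces mixed-case text directly instead of inserting codes for a later interpreting program.

```python
import re

# Hypothetical substitution rules, applied in order after lower-casing.
# A real rule set would be built for the vocabulary of the file at hand.
RULES = [
    (re.compile(r"\bnbs\b"), "NBS"),
    (re.compile(r"\bh2o\b"), "H2O"),
]

# Matches the first letter of the record and of each later sentence.
SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def mark_up(record):
    """Restore mixed case in an all-capitals record by substitution."""
    text = record.lower()
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )
```

For example, under these assumed rules mark_up("THE NBS DATA ON H2O. IT IS NEW.") returns "The NBS data on H2O. It is new."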

All editing is done with one program package that operates
on GPSDC records. Interestingly enough, several parts of
this package are simply modifications of programs built by
others to operate on ordinary binary coded decimal (BCD)
records. The same is true for reformatting and information
retrieval.

General remarks. It has become clear to us from this experience
that the expanded code used in GPSDC presents no bar to
the development of a very extensive and flexible text-handling
system. Input can be accepted from a wide variety of sources.
Very different material can be edited and reformatted by common
programs. Any output device appears to be accessible. It
is also clear that an automated text processing system can
be written in a high level language, and be written by many
hands.

It is this experience that makes us confident that this or
a similar system, based on standard codes, can be used
profitably by data centers, both for their internal operations
and in cooperation with each other.

Our experience also causes us to suggest that formal and
informal associations of data compilers and users should increase
the level of their interaction with the groups formally charged
with the task of developing standards for automated data
processing. We are particularly sensitive to the need for development
of flexible character string and character array processing
facilities in machine independent languages at the FORTRAN and
COBOL level. It seems to us that the broad adoption of a basic
standard man-machine alphabet should make it possible for the
developers of machine independent languages to assume a standard
man-machine alphabet for data representation. However, in making
this suggestion we must emphasize that at present GPSDC has no
formal standing as a USA Federal, USA National or International
standard. As of today GPSDC is part of an experiment. Its present
users are aware that much of the GPSDC system is based on
proposed, not formally adopted, standards. We need more experience
with broader classes of users, particularly in connection
with diagrams, before we will be completely confident about the
general validity of many of our ideas. In addition, we must
re-emphasize that this paper was restricted, primarily, to discussions
of the extension of the graphic character repertory. A
major task ahead is the specification of additional controls
and the development of revised prescriptions for recording data
files on magnetic tape intended for interchange and dissemination
as publications.

If GPSDC or any alternatives are to become formally recognized
automated data processing standards, a significantly large body of
users must exist and the prescriptions of their standards must be
made known through proper channels. A number of potential
channels exist.

Within the National Bureau of Standards the Institute for
Computer Sciences and Technology has the formal responsibility for
leading the development of standards for automated data
processing. As a partial discharge of this responsibility it maintains
an index [34] which sets forth an outline of the ISO, ANSI and
USA Federal efforts in developing standards for automated data
processing. This index provides a good starting point for people
interested in learning more about the formal processes of
standards development.



The work reported here has been a continuing effort at
synthesis under the restraints of developing standards. Some
very important sources of certain of the ideas and concepts
included in the synthesis have not been cited directly. It is
a pleasure to acknowledge the benefits we derived from our
[illegible] of the work of Feldman [35], Klerer [36], Mullen [37]